The acquisition and representation of basic experimental
information under the probabilistic paradigm are analysed.
The multinomial probability distribution is identified as
governing all scientific data collection, at least in principle.
For this distribution there exist unique random variables, whose
standard deviation becomes asymptotically invariant of physical
conditions. Representing all information by means of such
random variables gives the quantum mechanical probability
amplitude and a real alternative. For predictions, the linear
evolution law (Schrödinger or Dirac equation) turns out to be
the only way to extend the invariance property of the standard
deviation to the predicted quantities. This indicates that
quantum theory originates in the structure of gaining pure,
probabilistic information, without any mechanical underpinning.

I. INTRODUCTION 

The probabilistic paradigm proposed by Born is well
accepted for comparing experimental results to quantum
theoretical predictions [1]. It states that only the
probabilities of the outcomes of an observation are determined
by the experimental conditions. In this paper we wish to
place this paradigm first. We shall investigate its conse- 
quences without assuming quantum theory or any other 
physical theory. We look at this paradigm as defining the 
method of the investigation of nature. This consists in 
the collection of information in probabilistic experiments 
performed under well controlled conditions, and in the
efficient representation of this information. Realising that
the empirical information is necessarily finite permits us to
put limits on what can at best be extracted from this
information and therefore also on what can at best be
said about the outcomes of future experiments. At first,
this has nothing to do with laws of nature. But it tells
us what optimal laws look like under probability.
Interestingly, the quantum mechanical probability calculus is
found as almost the best possibility. It meets with
difficulties only when it must make predictions from a small
amount of input information. We find that the quantum
mechanical way of prediction does nothing but take the 
initial uncertainty volume of the representation space of 
the finite input information and move this volume about, 
without compressing or expanding it. However, we
emphasize that any mechanistic imagery of particles, waves,
fields, even space, must be seen as what it is: the
human brain's way of portraying sensory impressions, mere
images in our minds. Taking them as corresponding to
anything in nature, while going a long way in the design
of experiments, can become very counterproductive to
science's task of finding laws. Here, the search for
invariant structures in the empirical information, without any
models, seems to be the correct path. Once embarked
on this road, the old question of how nature really is, no 
longer seeks an answer in the muscular domain of mass, 
force, torque, and the like, which classical physics took 
as such unshakeable primary notions (not surprisingly, 
considering our ape nature, I cannot help commenting). 
Rather, one asks: Which of the structures detectable in
principle in probabilistic information are actually realized?

In the following sections we shall analyse the process of 
scientific investigation of nature under the probabilistic 
paradigm. We shall first look at how we gain informa- 
tion, then how we should best capture this information 
into numbers, and finally, what the ideal laws for making 
predictions should look like. The last step will bring the 
quantum mechanical time evolution, but will also indi- 
cate a problem due to finite information. 

II. GAINING EXPERIMENTAL INFORMATION 

Under the probabilistic paradigm basic physical
observation is not very different from tossing a coin or blindly
picking balls from an urn. One sets up specific conditions
and checks what happens. And then one repeats this
many times to gather statistically significant amounts
of information. The difference from classical probabilistic
experiments is that in quantum experiments one must
carefully monitor the conditions and ensure they are the
same for each trial. Any noticeable change constitutes a
different experimental situation and must be avoided.¹

¹ Strictly speaking, identical trials are impossible. A deeper
analysis of why one can neglect remote conditions might lead
to an understanding of the notion of spatial distance, about
which relativity says nothing, and which is badly missing in
today's physics.

Formally, one has a probabilistic experiment in which
a single trial can give K different outcomes, one of which
happens. The probabilities of these outcomes, $p_1, \ldots, p_K$
($\sum_j p_j = 1$), are determined by the conditions. But
they are unknown. In order to find their values, and
thereby the values of physical quantities functionally
related to them, one does N trials. Let us assume the
outcomes $j = 1, \ldots, K$ happen $L_1, \ldots, L_K$ times,
respectively ($\sum_j L_j = N$). The $L_j$ are random variables,
subject to the multinomial probability distribution. Listing
$L_1, \ldots, L_K$ represents the complete information gained in
the N trials. The customary way of representing the
information is however by other random variables, the so
called relative frequencies $\nu_j = L_j/N$. Clearly, they also
obey the multinomial probability distribution.
Examples: 

* A trial in a spin-1/2 Stern-Gerlach experiment has two
possible outcomes. This experiment is therefore governed
by the binomial probability distribution.

* A trial in a GHZ experiment has eight possible
outcomes, because each of the three particles can end up in
one of two detectors [2]. Here, the relative frequencies
follow the multinomial distribution of order eight.

* Measuring an intensity in a detector, which can only fire 
or not fire, is in fact an experiment where one repeatedly 
checks whether a firing occurs in a sufficiently small time 
interval. Thus one has a binomial experiment. If the 
rate of firing is small, the binomial distribution can be 
approximated by the Poisson distribution. 
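
The last example can be checked numerically. The following is a minimal sketch, not part of the original analysis: it compares the binomial distribution for n small time intervals with the Poisson distribution of the same expected count. The function names and the chosen values of n and p are illustrative assumptions.

```python
import math

def binomial_pmf(n, p, k):
    """Probability of k firings in n time intervals, each firing with probability p."""
    return math.comb(n, k) * p**k * (1.0 - p)**(n - k)

def poisson_pmf(mean, k):
    """Poisson probability of k firings when the expected count is `mean`."""
    return math.exp(-mean) * mean**k / math.factorial(k)

def max_abs_difference(n, p, k_max=30):
    """Largest pointwise gap between binomial(n, p) and Poisson(n*p) for k = 0..k_max."""
    return max(abs(binomial_pmf(n, p, k) - poisson_pmf(n * p, k))
               for k in range(min(n, k_max) + 1))

if __name__ == "__main__":
    # Same expected count (2 firings), but increasingly fine time slicing,
    # i.e. an increasingly small firing probability per interval.
    for n, p in [(10, 0.2), (100, 0.02), (10000, 0.0002)]:
        print(n, p, max_abs_difference(n, p))
```

The gap shrinks as the firing probability per interval decreases at fixed expected count, which is the regime referred to in the third example.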

We must emphasize that the multinomial probability 
distribution is of utmost importance to physics under 
the probabilistic paradigm. This can be seen as fol- 
lows: The conditions of a probabilistic experiment must 
be verified by auxiliary measurements. These are usu- 
ally coarse classical measurements, but should actually 
also be probabilistic experiments of the most exacting 
standards. The probabilistic experiment of interest must 
therefore be done by ensuring that for each of its trials 
the probabilities of the outcomes of the auxiliary prob- 
abilistic experiments are the same. Consequently, em- 
pirical science is characterized by a succession of data- 
takings of multinomial probability distributions of var- 
ious orders. The laws of physics are contained in the 
relations between the random variables from these differ- 
ent experiments. Since the statistical verification of these 
laws is again ruled by the properties of the multinomial 
probability distribution, we should expect that the in- 
ner structure of the multinomial probability distribution 
will appear in one form or another in the fundamental 
laws of physics. In fact, we might be led to the bold 
conjecture that, under the probabilistic paradigm, basic 
physical law is no more than the structures implicit in 
the multinomial probability distribution. There is no es- 
cape from this distribution. Whichever way we turn, we 
stumble across it as the unavoidable tool for connecting 
empirical data to physical ideas. 

The multinomial probability distribution of order K
is obtained when calculating the probability that, in N
trials, the outcomes $1, \ldots, K$ occur $L_1, \ldots, L_K$ times,
respectively [3]:

$$ \mathrm{Prob}(L_1, \ldots, L_K \mid N, p_1, \ldots, p_K) = \frac{N!}{L_1! \cdots L_K!}\, p_1^{L_1} \cdots p_K^{L_K}. \qquad (1) $$

The expectation values of the relative frequencies are

$$ \bar{\nu}_j = p_j \qquad (2) $$

and their standard deviations are

$$ \Delta\nu_j = \sqrt{\frac{p_j (1 - p_j)}{N}}. \qquad (3) $$
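
As an illustration of eqs. (2) and (3), the sketch below repeats a multinomial experiment many times and compares the empirical mean and standard deviation of each relative frequency with $p_j$ and $\sqrt{p_j(1-p_j)/N}$. It is only a numerical check under assumed values; the function names, the probabilities and the number of repetitions are hypothetical choices made for this example.

```python
import math
import random

def sample_relative_frequencies(probs, n_trials, rng):
    """Run n_trials trials with outcome probabilities `probs`; return nu_j = L_j / N."""
    outcomes = rng.choices(range(len(probs)), weights=probs, k=n_trials)
    counts = [0] * len(probs)
    for j in outcomes:
        counts[j] += 1
    return [c / n_trials for c in counts]

def empirical_mean_and_std(probs, n_trials, n_repeats, seed=0):
    """Repeat the whole N-trial experiment n_repeats times and summarise each nu_j."""
    rng = random.Random(seed)
    runs = [sample_relative_frequencies(probs, n_trials, rng) for _ in range(n_repeats)]
    means, stds = [], []
    for j in range(len(probs)):
        column = [run[j] for run in runs]
        mean = sum(column) / n_repeats
        var = sum((x - mean) ** 2 for x in column) / (n_repeats - 1)
        means.append(mean)
        stds.append(math.sqrt(var))
    return means, stds

def predicted_std(probs, n_trials):
    """Standard deviation of nu_j according to eq. (3)."""
    return [math.sqrt(p * (1.0 - p) / n_trials) for p in probs]

if __name__ == "__main__":
    probs, n_trials = [0.5, 0.3, 0.2], 100
    means, stds = empirical_mean_and_std(probs, n_trials, n_repeats=2000)
    print("means    :", means)
    print("stds     :", stds)
    print("eq. (3)  :", predicted_std(probs, n_trials))
```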



III. EFFICIENT REPRESENTATION OF 
PROBABILISTIC INFORMATION 

The reason why probabilistic information is most often
represented by the relative frequencies $\nu_j$ seems to be
historical: Probability theory originated as a method
of estimating fractions of countable sets, when inspecting
all elements was not possible (good versus bad apples in
a large plantation, desirable versus undesirable outcomes
in games of chance, etc.). The relative frequencies and
their limits were the obvious entities to work with. But
the information can be represented equally well by other
random variables $\chi_j$, as long as these are one-to-one
mappings $\chi_j(\nu_j)$, so that no information is lost. The question
is whether there exists a most efficient representation.

To answer this, let us see what we know about the
limits $p_1, \ldots, p_K$ before the experiment, but having
decided to do N trials. Our analysis is equivalent for all
K outcomes, so that we can pick out one and drop the
subscript. We can use Chebyshev's inequality [4] to
estimate the width $w_p$ of the interval to which the probability
p of the chosen outcome is pinned down.²

If N is not too small, we get

$$ w_p = 2k \sqrt{\frac{\nu (1 - \nu)}{N}}, \qquad (4) $$

where k is a free confidence parameter. (Eq. (4) is not
valid at $\nu = 0$ or 1.) Before the experiment we do not
know $\nu$, so we can only give the upper limit,

$$ w_p \le \frac{k}{\sqrt{N}}. \qquad (5) $$



² Chebyshev's inequality states: For any random variable
whose standard deviation exists, the probability that the
value of the random variable deviates by more than k standard
deviations from its expectation value is less than, or
equal to, $k^{-2}$. Here, k is a free confidence parameter greater
than 1.

But we can be much more specific about the limit $\bar{\chi}$ of
the random variable $\chi(\nu)$, for which we require that, at
least for large N, the standard deviation $\Delta\chi$ shall be
independent of p (or of $\bar{\chi}$ for that matter, since there will
exist a function $p(\bar{\chi})$),

$$ \Delta\chi = \frac{C}{\sqrt{N}}, \qquad (6) $$

where C is an arbitrary real constant. A straightforward
analysis (for large N one has $\Delta\chi = |d\chi/d\nu|\,\Delta\nu$, so
that eq. (6) demands $d\chi/d\nu = C/\sqrt{\nu(1-\nu)}$) reveals

$$ \chi = C \arcsin(2\nu - 1) + \theta, \qquad (7) $$

where $\theta$ is an arbitrary real constant [5]. For comparison
with $\nu$ we confine $\chi$ to [0,1] and thus set $C = \pi^{-1}$ and
$\theta = 0.5$. Then we have $\Delta\chi = 1/(\pi\sqrt{N})$, and upon application
of Chebyshev's inequality we get the interval $w_\chi$ to which
we can pin down the unknown limit $\bar{\chi}$ as

$$ w_\chi = \frac{2k}{\pi\sqrt{N}}. \qquad (8) $$



Clearly, this is narrower than the upper limit for $w_p$ in
eq. (5). Having done no experiment at all, we have better
knowledge on the value of the limit $\bar{\chi}$ than on the value
of p, although both can only be in the interval [0,1]. And note
that the actual experimental data will add nothing to the
accuracy with which we know $\bar{\chi}$, but they may add to the
accuracy with which we know p. Nevertheless, even with
data, $w_p$ may still be larger than $w_\chi$, especially when p is
around 0.5. Figure 1 shows the relation of $\nu$ and $\chi$, and
how the probability distribution of $\nu$ for various values
of p gets squeezed and stretched when plotting it for $\chi$.
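
The comparison of the interval widths can be made concrete with a few lines of arithmetic. The sketch below assumes the forms $w_p = 2k\sqrt{\nu(1-\nu)/N}$ for eq. (4), $w_p \le k/\sqrt{N}$ for eq. (5) and $w_\chi = 2k/(\pi\sqrt{N})$ for eq. (8); the function names and the sample values of N, k and $\nu$ are illustrative assumptions.

```python
import math

def width_p(nu, n_trials, k):
    """Chebyshev interval width for p after observing relative frequency nu, eq. (4)."""
    return 2.0 * k * math.sqrt(nu * (1.0 - nu) / n_trials)

def width_p_upper(n_trials, k):
    """Data-independent upper limit on the width for p, eq. (5)."""
    return k / math.sqrt(n_trials)

def width_chi(n_trials, k):
    """Interval width for the limit of chi with C = 1/pi, eq. (8)."""
    return 2.0 * k / (math.pi * math.sqrt(n_trials))

if __name__ == "__main__":
    n_trials, k = 100, 3.0
    print("upper limit for w_p:", width_p_upper(n_trials, k))
    print("w_chi              :", width_chi(n_trials, k))
    for nu in (0.05, 0.25, 0.5):
        print("nu =", nu, " w_p =", width_p(nu, n_trials, k))
```

With these forms, $w_p$ exceeds $w_\chi$ whenever $\sqrt{\nu(1-\nu)} > 1/\pi$, which covers the whole central range of $\nu$ around 0.5.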

For the representation of information the random
variable $\chi$ is the proper choice, because it disentangles the
two aspects of empirical information: the number of
trials N, which is determined by the experimenter, not by
nature, and the actual data, which are only determined
by nature. The experimenter controls the accuracy $w_\chi$
by deciding N, nature supplies the data $\chi$, and thereby
the whereabouts of the limit $\bar{\chi}$. In the real domain the only
other random variables with this property are the linear
transformations afforded by C and $\theta$. From the physical point
of view $\chi$ is of interest, because its standard deviation is
an invariant of the physical conditions as contained in p
or $\bar{\chi}$. The random variable $\chi$ expresses empirical
information with a certain efficiency, eliminating a numerical
distortion that is due to the structure of the multinomial
distribution, and which is apparent in all other random
variables. We shall call $\chi$ an efficient random variable
(ER). More generally, we shall call any random variable
an ER, whose standard deviation is asymptotically
invariant of the limit the random variable tends to, eq. (6).
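
The defining property of an ER can be inspected directly for the binomial case. The following sketch computes, from the exact binomial distribution, the standard deviation of $\nu$ and of $\chi = \arcsin(2\nu - 1)/\pi + 1/2$ for N = 100 and the five values of p used in Fig. 1, and compares the latter with $1/(\pi\sqrt{N})$. The helper names are assumptions of this example, not notation from the text.

```python
import math

def binomial_pmf(n, p, k):
    """Probability that the chosen outcome occurs k times in n trials."""
    return math.comb(n, k) * p**k * (1.0 - p)**(n - k)

def chi(nu):
    """chi = arcsin(2 nu - 1)/pi + 1/2, i.e. eq. (7) with C = 1/pi and theta = 0.5."""
    return math.asin(2.0 * nu - 1.0) / math.pi + 0.5

def exact_std(transform, n, p):
    """Standard deviation of transform(L/n) when L follows binomial(n, p)."""
    values = [transform(k / n) for k in range(n + 1)]
    weights = [binomial_pmf(n, p, k) for k in range(n + 1)]
    mean = sum(w * v for w, v in zip(weights, values))
    var = sum(w * (v - mean) ** 2 for w, v in zip(weights, values))
    return math.sqrt(var)

if __name__ == "__main__":
    n = 100
    target = 1.0 / (math.pi * math.sqrt(n))
    for p in (0.07, 0.25, 0.50, 0.75, 0.93):
        std_nu = exact_std(lambda x: x, n, p)
        std_chi = exact_std(chi, n, p)
        print(p, std_nu, std_chi, std_chi / target)
```

The standard deviation of $\nu$ varies strongly with p, whereas that of $\chi$ stays close to the same value, with small finite-N deviations for p near 0 or 1.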

A graphical depiction of the relation between $\nu$ and
$\chi$ can be given by drawing a semicircle of diameter 1
along which we plot $\nu$ (Fig. 2a). By orthogonal
projection onto the semicircle we get the random variable
$\xi = [\pi + 2\arcsin(2\nu - 1)]/4$ and thereby $\chi$, when we
choose different constants. The drawing also suggests a
simple way to obtain a complex ER. We scale the
semicircle by an arbitrary real factor a, tilt it by an
arbitrary angle $\varphi$, and place it into the complex plane as
shown in Fig. 2b. This gives the random variable

$$ \beta = a \left( \sqrt{\nu (1 - \nu)} + i\nu \right) e^{i\varphi} + b, \qquad (9) $$

where b is an arbitrary complex constant. We get a very
familiar special case by setting a = 1 and b = 0:

$$ \psi = \left( \sqrt{\nu (1 - \nu)} + i\nu \right) e^{i\varphi}. \qquad (10) $$

For large N the probability distribution of $\nu$ becomes
gaussian, but also that of any smooth function of $\nu$, as
can be seen in Fig. 1. Therefore the standard deviation
of $\psi$ is obtained as

$$ \Delta\psi = \frac{1}{2\sqrt{N}}. \qquad (11) $$

Obviously, the random variable $\psi$ is an ER. It fulfills
$|\psi|^2 = \nu$, and we recognize it as the probability
amplitude of quantum theory, which we would infer from the
observed relative frequency $\nu$. Note, however, that the
intuitive way of getting the quantum mechanical
probability amplitude, namely, by simply taking $\sqrt{\nu}\,\exp(i\alpha)$,
where $\alpha$ is an arbitrary phase, does not give us an ER.
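
The contrast between the two candidate amplitudes can be checked with the exact binomial distribution. The sketch below assumes $\psi = (\sqrt{\nu(1-\nu)} + i\nu)\,e^{i\varphi}$ and the intuitive form $\sqrt{\nu}\,e^{i\alpha}$, and takes the standard deviation of a complex random variable to be the square root of the expected squared modulus of its deviation from the mean. Function names and parameter values are illustrative assumptions.

```python
import cmath
import math

def binomial_pmf(n, p, k):
    """Probability that the chosen outcome occurs k times in n trials."""
    return math.comb(n, k) * p**k * (1.0 - p)**(n - k)

def psi(nu, phase=0.0):
    """Complex ER: (sqrt(nu (1 - nu)) + i nu) exp(i phase), so that |psi|^2 = nu."""
    return (math.sqrt(nu * (1.0 - nu)) + 1j * nu) * cmath.exp(1j * phase)

def naive_amplitude(nu, alpha=0.0):
    """Intuitive amplitude sqrt(nu) exp(i alpha); also has squared modulus nu."""
    return math.sqrt(nu) * cmath.exp(1j * alpha)

def exact_complex_std(transform, n, p):
    """sqrt(E|z - E z|^2) for z = transform(L/n), with L following binomial(n, p)."""
    values = [transform(k / n) for k in range(n + 1)]
    weights = [binomial_pmf(n, p, k) for k in range(n + 1)]
    mean = sum(w * v for w, v in zip(weights, values))
    var = sum(w * abs(v - mean) ** 2 for w, v in zip(weights, values))
    return math.sqrt(var)

if __name__ == "__main__":
    n = 100
    print("1/(2 sqrt(N)) =", 1.0 / (2.0 * math.sqrt(n)))
    for p in (0.1, 0.5, 0.9):
        print(p, exact_complex_std(psi, n, p), exact_complex_std(naive_amplitude, n, p))
```

For $\psi$ the result stays near $1/(2\sqrt{N})$ for all three values of p, while for the intuitive amplitude it shrinks markedly as p grows, in line with the first-order estimate $\sqrt{1-p}/(2\sqrt{N})$.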

We now have two ways of representing the obtained
information by ERs, either the real valued $\chi$ or the complex
valued $\beta$. Since the relative frequency of each of the K
outcomes can be converted to its respective efficient
random variable, the result of a general probabilistic
experiment is efficiently represented by the vector
$(\chi_1, \ldots, \chi_K)$, or by the vector $(\beta_1, \ldots, \beta_K)$. The latter
is equivalent to the quantum mechanical state vector, if we
normalize it: $(\psi_1, \ldots, \psi_K)$.

At this point it is not clear whether fundamental
science could be built solely on the real ERs $\chi_j$ or whether
it must rely on the complex ERs $\beta_j$, and for practical
reasons on the normalized case $\psi_j$, as suggested by
current formulations of quantum theory. We cannot address
this problem here, but mention that working with the $\beta_j$
or $\psi_j$ can lead to nonsensical predictions, while working
with the $\chi_j$ never does, so that the former are more
sensitive to inconsistencies in the input data [6]. Therefore
we use only the $\psi_j$ in the next section, but will not read
them as if we were doing quantum theory.



IV. PREDICTIONS 

Let us now see whether the representation of
probabilistic information by ERs suggests specific laws for
predictions. A prediction is a statement on the expected
values of the probabilities of the different outcomes of a
probabilistic experiment, which has not yet been done, or
whose data we just do not yet know, on the basis of
auxiliary probabilistic experiments, which have been done,
and whose data we do know. We intend to make a
prediction for a probabilistic experiment with Z outcomes,
and wish to calculate the quantities $\phi_s$ ($s = 1, \ldots, Z$),
which shall be related to the predicted probabilities $P_s$
as $P_s = |\phi_s|^2$. We do not presuppose that the $\phi_s$ are
ERs.

We assume we have done M different auxiliary
probabilistic experiments of various multinomial orders $K_m$,
$m = 1, \ldots, M$, and we think that they provided all the
input information needed to predict the $\phi_s$, and therefore
the $P_s$. With (10) the obtained information is
represented by the ERs $\psi_j^m$, where m denotes the experiment
and j labels a possible outcome in it ($j = 1, \ldots, K_m$).
Then the predictions are

$$ \phi_s = \phi_s\left( \psi_1^1, \ldots, \psi_{K_1}^1, \ldots, \psi_1^M, \ldots, \psi_{K_M}^M \right), \qquad (12) $$

and their standard deviations are, by the usual
convolution of gaussians as approximations of the multinomial
distributions,

$$ \Delta\phi_s = \sqrt{ \sum_{m=1}^{M} \frac{1}{4 N_m} \sum_{j=1}^{K_m} \left| \frac{\partial \phi_s}{\partial \psi_j^m} \right|^2 }, \qquad (13) $$



where $N_m$ is the number of trials of the m-th auxiliary
experiment. If we wish the $\phi_s$ to be ERs, we must demand
that the $\Delta\phi_s$ depend only on the $N_m$. (A technical
requirement is that in each of the M auxiliary experiments
one of the phases of the ERs $\psi_j^m$ cannot be chosen freely,
otherwise the second summations in (13) could not go
to $K_m$, but only to $K_m - 1$.) Then the derivatives in
(13) must be constants, implying that the $\phi_s$ are linear
in the $\psi_j^m$. However, we cannot simply assume such
linearity, because (12) contains the laws of physics, which
cannot be known a priori. But we want to point out
that a linear relation for (12) has very exceptional
properties, so that it would be nice if we found it realized in
nature. To be specific, if the $N_m$ are sufficiently large,
linearity would afford predictive power which no other
functional relation could achieve: It would be sufficient
to know the number of trials of each auxiliary probabilistic
experiment in order to specify the accuracy of the
predicted $\phi_s$. No data would be needed, only a decision
how many trials each auxiliary experiment will be given!
Moreover, even the slightest increase of the amount of
input information, by only doing one more trial in any of
the auxiliary experiments, would lead to better accuracy
of the predicted $\phi_s$, by bringing a definite decrease of the
$\Delta\phi_s$. This latter property is absent in virtually all other
functional relations conceivable for (12). In fact, most
nonlinear relations would allow more input information
to result in less accurate predictions. This would
undermine the very idea of empirical science, namely that
by observation our knowledge about nature can only
increase, never just stay the same, let alone decrease. For
this reason we assume linearity and apply it to a concrete
example.
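
The two properties claimed for a linear relation can be illustrated with a short calculation. The sketch below assumes that eq. (13) has the form $\Delta\phi_s = \sqrt{\sum_m (4N_m)^{-1} \sum_j |\partial\phi_s/\partial\psi_j^m|^2}$ and that the derivatives are constants $a_{mj}$; the coefficient values, trial numbers and the function name are hypothetical.

```python
import math

def predicted_std(coefficients, trials):
    """Eq. (13) for a linear prediction phi = sum_m sum_j a[m][j] * psi[m][j].

    coefficients[m][j] is the constant derivative d(phi)/d(psi_j^m);
    trials[m] is the number of trials N_m of auxiliary experiment m.
    """
    total = 0.0
    for a_m, n_m in zip(coefficients, trials):
        total += sum(abs(a) ** 2 for a in a_m) / (4.0 * n_m)
    return math.sqrt(total)

if __name__ == "__main__":
    # Two auxiliary experiments with 2 and 3 outcomes (illustrative coefficients).
    coefficients = [[0.6, 0.8j], [0.5 + 0.5j, 0.5, -0.5j]]
    print(predicted_std(coefficients, [1000, 500]))
    # One more trial in the first experiment: the accuracy improves slightly.
    print(predicted_std(coefficients, [1001, 500]))
```

No measured data enter the function: only the trial numbers and the fixed coefficients are needed, and any additional trial lowers the result.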



We take a particle in a one dimensional box of width 
w. Alice repeatedly prepares the particle in a state only 
she knows. At time t after the preparation Bob measures 
the position by subdividing the box into K bins of width 
w/K and checking in which he finds the particle. In 
N trials Bob obtains the relative frequencies $\nu_1, \ldots, \nu_K$,
giving a good idea of the particle's position probability
distribution at time t. He represents this information by
the ERs $\psi_j$ of (10) and wants to use it to predict the
position probability distribution at time T (T > t).

First he predicts for t + dt. With (12) the predicted $\phi_s$
must be linear in the $\psi_j$ if they are to be ERs,

$$ \phi_s(t + dt) = \sum_{j=1}^{K} a_{sj} \psi_j. \qquad (14) $$

Clearly, when $dt \to 0$ we must have $a_{sj} = 1$ for $s = j$ and
$a_{sj} = 0$ otherwise, so we can write

$$ a_{sj}(t) = \delta_{sj} + g_{sj}(t)\,dt, \qquad (15) $$

where $g_{sj}(t)$ are the complex elements of a matrix G and
we included the possibility that they depend on t. Using
matrix notation and writing the $\phi_s$ and $\psi_j$ as column
vectors we have

$$ \phi(t + dt) = [1 + G(t)\,dt]\,\psi. \qquad (16) $$

For a prediction for time t + 2dt we must apply another
such linear transformation to the prediction we had for
t + dt,

$$ \phi(t + 2dt) = [1 + G(t + dt)\,dt]\,\phi(t + dt). \qquad (17) $$

Replacing t + dt by t, and using
$\phi(t + dt) = \phi(t) + \frac{\partial \phi(t)}{\partial t}\,dt$, we have

$$ \frac{\partial \phi(t)}{\partial t} = G(t)\,\phi(t). \qquad (18) $$



With (10) the input vector was normalized, $|\psi|^2 = 1$.
We also demand this from the vector $\phi$. This results
in the constraint that the diagonal elements $g_{ss}$ must
be imaginary and the off-diagonal elements must fulfill
$g_{sj} = -g_{js}^*$. And then we obviously have an evolution
equation just as we know it from quantum theory.
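
The role of the constraint on G can be seen in a small numerical experiment. The sketch below repeatedly applies the step $\phi \to [1 + G\,dt]\,\phi$ of eqs. (16) and (17) with a time-independent G whose diagonal is imaginary and whose off-diagonal elements obey $g_{sj} = -g_{js}^*$, and monitors $|\phi|^2$. The particular matrix entries, the step count and all function names are arbitrary assumptions made for illustration; since the step is only first order in dt, the norm is conserved up to an error that vanishes as $dt \to 0$.

```python
def make_generator(k, scale=1.0):
    """Build an illustrative K x K matrix with g_ss imaginary and g_sj = -conj(g_js)."""
    g = [[0j] * k for _ in range(k)]
    for s in range(k):
        g[s][s] = 1j * scale * (s + 1)          # purely imaginary diagonal
        for j in range(s + 1, k):
            g[s][j] = scale * complex(0.3 * (s + 1), 0.2 * (j + 1))
            g[j][s] = -g[s][j].conjugate()      # constraint on off-diagonal pairs
    return g

def step(g, phi, dt):
    """One application of eq. (16)/(17): phi -> [1 + G dt] phi."""
    k = len(phi)
    return [phi[s] + dt * sum(g[s][j] * phi[j] for j in range(k)) for s in range(k)]

def evolve(g, phi, total_time, n_steps):
    """Apply n_steps infinitesimal steps covering total_time."""
    dt = total_time / n_steps
    for _ in range(n_steps):
        phi = step(g, phi, dt)
    return phi

def norm_squared(phi):
    """Sum of |phi_s|^2, i.e. the total predicted probability."""
    return sum(abs(c) ** 2 for c in phi)

if __name__ == "__main__":
    g = make_generator(3)
    phi = evolve(g, [1.0 + 0j, 0j, 0j], total_time=1.0, n_steps=20000)
    print("norm with constrained G  :", norm_squared(phi))
    g_bad = [[0.5 + 0j, 0j], [0j, -0.2 + 0j]]   # violates the constraint
    phi_bad = evolve(g_bad, [1.0 + 0j, 0j], total_time=1.0, n_steps=20000)
    print("norm with unconstrained G:", norm_squared(phi_bad))
```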

For a quantitative prediction we need to know G(t)
and the phases $\varphi_j$ of the initial $\psi_j$. We had assumed the
$\varphi_j$ to be arbitrary. But now we see that they influence
the prediction, and therefore they attain physical
significance. G(t) is a complex K × K matrix generating a
unitary transformation. For fixed conditions it is independent
of time, and with the properties found above, it is given by
$K^2 - 1$ real numbers. The initial vector $\psi$ has K complex
components. It is normalized and one phase is free, so that
it is fixed by $2K - 2$ real numbers. Altogether
$K^2 + 2K - 3 = (K + 3)(K - 1)$
numbers are needed to enable prediction. Since one
probabilistic experiment yields $K - 1$ numbers, Bob must do
$K + 3$ probabilistic experiments with different delay times
between Alice's preparation and his measurement to
obtain sufficient input information. But neither Planck's
constant nor the particle's mass is needed. It should be
noted that this analysis remains unaltered if the initial
vector $\psi$ is obtained from measurement of joint
probability distributions of several particles. Therefore, (18) also
contains entanglement between particles.
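
The counting in the preceding paragraph is simple arithmetic and can be tabulated for a few bin numbers K. The sketch below merely restates the numbers given in the text ($K^2 - 1$ for G, $2K - 2$ for the initial vector, $K - 1$ per experiment); the function name and the dictionary keys are hypothetical.

```python
def parameter_count(k):
    """Count the real numbers needed for prediction with K bins, as in the text."""
    generator = k * k - 1            # real numbers fixing G
    initial_vector = 2 * k - 2       # normalized vector with one free phase
    total = generator + initial_vector
    per_experiment = k - 1           # independent relative frequencies per experiment
    experiments, remainder = divmod(total, per_experiment)
    return {
        "generator": generator,
        "initial_vector": initial_vector,
        "total": total,
        "experiments": experiments,
        "remainder": remainder,
    }

if __name__ == "__main__":
    for k in (2, 3, 5, 10):
        print(k, parameter_count(k))
```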

V. DISCUSSION 

This paper was based on the insight that under the 
probabilistic paradigm data from observations are sub- 
ject to the multinomial probability distribution. For the 
representation of the empirical information we searched 
for random variables which are stripped of numerical 
artefacts. They should therefore have an invariance prop- 
erty. We found as unique random variables a real and a 
complex class of efficient random variables (ERs). They 
capture the obtained information more efficiently than 
others, because their standard deviation is an asymptotic 
invariant of the physical conditions. The quantum me- 
chanical probability amplitude is the normalized case of 
the complex class. It is natural that fundamental prob- 
abilistic science should use such random variables rather 
than any others as the representors of the observed in- 
formation, and therefore as the carriers of meaning. 

Using the ERs for prediction has given us an evolution
prescription which is equivalent to the quantum
theoretical way of applying a sequence of infinitesimal rotations
to the state vector in Hilbert space [7]. It seems that
simply analysing how we gain empirical information, what
we can say from it about expected future information,
and not succumbing to the lure of the question what
is behind this information, can give us a basis for
doing physics. This confirms the operational approach to
science. And it is in support of Wheeler's It-from-Bit
hypothesis [8], Weizsäcker's ur-theory [9], Eddington's
idea that information increase itself defines the rest [10],
Anandan's conjecture of absence of dynamical laws [11],
Bohr and Ulfbeck's hypothesis of mere symmetry [12] or
the recent 1 Bit = 1 Constituent hypothesis of Brukner
and Zeilinger [13].

In view of the analysis presented here the quantum
theoretical probability calculus is an almost trivial
consequence of probability theory, but not as applied to
'objects' or anything 'physical', but as applied to the naked
data of probabilistic experiments. If we continue this
idea we encounter a deeper problem, namely whether
the space which we consider physical, this 3- or higher
dimensional manifold in which we normally assume the
world to unfurl [14], cannot also be understood as a
peculiar way of representing data. Kant conjectured this, in
somewhat different words, over 200 years ago [15]. And
indeed it is clearly so, if we imagine the human observer
as a robot who must find a compact memory
representation of the gigantic data stream it receives through its
senses [16]. That is why our earlier example of the
particle in a box should only be seen as illustration by means
of familiar terms. It should not imply that we accept
the naive conception of space or things, like particles,
'in' it, although this view works well in everyday life and
in the laboratory, as long as we are not doing
quantum experiments. We think that a full acceptance of the
probabilistic paradigm as the basis of empirical science 
will eventually require an attack on the notions of spatial 
distance and spatial dimension from the point of view of 
optimal representation of probabilistic information. 

Finally, we want to remark on a difference between our
analysis and quantum theory. We have emphasized that the
standard deviations of the ERs $\chi$ and $\psi$ become
independent of the limits of these ERs only when we have
infinitely many trials. But there is a departure for finitely
many trials, especially for values of p close to 0 and close
to 1. With some imagination this can be noticed in Fig. 1
in the top and bottom probability distributions of $\chi$,
which are a little bit wider than those in the middle.
But as we always have only finitely many trials, there
should exist random variables which fulfill our
requirement for an ER even better than $\chi$ and $\psi$. This implies
that predictions based on these unknown random
variables should also be more precise! Whether we should
see this as a fluke of statistics, or as a need to amend 
quantum theory is a debatable question. But it should 
be testable. We need to have a number of different prob- 
abilistic experiments, all of which are done with only very 
few trials. From this we want to predict the outcomes of 
another probabilistic experiment, which is then also done 
with only few trials. Presumably, the optimal procedure 
of prediction will not be the one we have presented here 
(and therefore not quantum theory). The difficulty with 
such tests is however that, in the usual interpretation of 
data, statistical theory and quantum theory are treated 
as separate, while one message of this paper may also be 
that under the probabilistic paradigm the bottom level 
of physical theory should be equivalent to optimal
representation of probabilistic information, and this theory
should not be in need of additional purely statistical
theories to connect it to actual data. We are discussing this
problem in a future paper [17].

FIG. 1. Functional relation between random variables $\nu$
and $\chi$, and their respective probability distributions as
expected for N = 100 trials, plotted for five different values of p:
.07, .25, .50, .75 and .93. The bar above each probability
distribution indicates twice its standard deviation. Notice that
the standard deviations of $\nu$ differ considerably for different
p, while those of $\chi$ are all the same, as required in eq. (6).

FIG. 2. (a) Graphical construction of efficient random
variable $\xi$ (and thereby of $\chi$) from the observed relative
frequency $\nu$. $\xi$ is measured along the arc. (b) Similar
construction of the efficient random variable $\beta$. It is given by its
coordinates in the complex plane. The quantum mechanical
probability amplitude $\psi$ is the normalized case of $\beta$, obtained
by setting a = 1 and b = 0.